{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4",
      "machine_shape": "hm"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Learn OpenAI Whisper - Chapter 9\n",
        "## Notebook 3 - PVS step 2: Fine-tuning personalized voice synthesis (PVS) models with Deep Learning Art School (DLAS)\n",
        "\n",
        "This notebook complements the book [Learn OpenAI Whisper](https://a.co/d/1p5k4Tg).\n",
        "\n",
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1qKflIgjPFVDW3qLaL08CV-quth5MwcRd)\n",
        "\n",
        "This notebook fine-tunes a PVS model using the DLAS toolkit. It is based on the Tortoise fine-tuning fork (https://github.com/152334H/DL-Art-School) of James Betker's DL-Art-School project. The batch size was reduced from the original 128 to 90: a batch of 128 uses about 16 GB of VRAM, while the Colab free tier provides an NVIDIA T4 with roughly 15 GB."
      ],
      "metadata": {
        "id": "YA-Mk63N0ac9"
      }
    },
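    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The batch-size reduction can be sanity-checked with a rough linear VRAM estimate (a sketch: the 16 GB figure comes from the original 128-batch run, and linear scaling of memory with batch size is an assumption):\n",
        "\n",
        "```python\n",
        "# Rough heuristic: assume VRAM use scales linearly with batch size.\n",
        "def estimated_vram_gb(batch_size, ref_batch=128, ref_vram_gb=16.0):\n",
        "    return ref_vram_gb * batch_size / ref_batch\n",
        "\n",
        "print(estimated_vram_gb(90))  # ~11.25 GB, within a T4's ~15 GB\n",
        "```"
      ]
    },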
    {
      "cell_type": "markdown",
      "source": [
        "## 1. Checking the GPU:\n",
        "This cell checks whether an NVIDIA GPU is available by running the `nvidia-smi` command and prints the GPU details if one is connected."
      ],
      "metadata": {
        "id": "UjvK4JWKdYqQ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "gpu_info = !nvidia-smi\n",
        "gpu_info = '\\n'.join(gpu_info)\n",
        "if gpu_info.find('failed') >= 0:\n",
        "  print('Not connected to a GPU')\n",
        "else:\n",
        "  print(gpu_info)"
      ],
      "metadata": {
        "id": "Z6kCeDNrdWM7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "07f3babb-c72f-423a-c6bd-c0d343f98232"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Thu Apr 11 22:47:46 2024       \n",
            "+---------------------------------------------------------------------------------------+\n",
            "| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |\n",
            "|-----------------------------------------+----------------------+----------------------+\n",
            "| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
            "| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |\n",
            "|                                         |                      |               MIG M. |\n",
            "|=========================================+======================+======================|\n",
            "|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |\n",
            "| N/A   61C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |\n",
            "|                                         |                      |                  N/A |\n",
            "+-----------------------------------------+----------------------+----------------------+\n",
            "                                                                                         \n",
            "+---------------------------------------------------------------------------------------+\n",
            "| Processes:                                                                            |\n",
            "|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |\n",
            "|        ID   ID                                                             Usage      |\n",
            "|=======================================================================================|\n",
            "|  No running processes found                                                           |\n",
            "+---------------------------------------------------------------------------------------+\n"
          ]
        }
      ]
    },
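    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you want the memory figures programmatically rather than as raw text, the `Memory-Usage` column of the output above can be parsed (a sketch; `parse_mem_usage` is a hypothetical helper, not part of the notebook):\n",
        "\n",
        "```python\n",
        "import re\n",
        "\n",
        "def parse_mem_usage(smi_text):\n",
        "    # Extract (used_mib, total_mib) from a field like '0MiB / 15360MiB'.\n",
        "    m = re.search(r'(\\d+)MiB\\s*/\\s*(\\d+)MiB', smi_text)\n",
        "    return (int(m.group(1)), int(m.group(2))) if m else None\n",
        "\n",
        "print(parse_mem_usage('|      0MiB / 15360MiB |'))  # (0, 15360)\n",
        "```"
      ]
    },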
    {
      "cell_type": "markdown",
      "source": [
        "## 2. Checking virtual memory:\n",
        "This cell checks the runtime's total RAM using the `psutil` library and reports whether a high-RAM runtime (more than 20 GB) is in use."
      ],
      "metadata": {
        "id": "sEHes5FQXbbP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from psutil import virtual_memory\n",
        "ram_gb = virtual_memory().total / 1e9\n",
        "print('Your runtime has {:.1f} gigabytes of available RAM\\n'.format(ram_gb))\n",
        "\n",
        "if ram_gb < 20:\n",
        "  print('Not using a high-RAM runtime')\n",
        "else:\n",
        "  print('You are using a high-RAM runtime!')"
      ],
      "metadata": {
        "id": "pVHkIfjHXeIw",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "73b9a947-708e-412b-b2ce-db16aad6866f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Your runtime has 54.8 gigabytes of available RAM\n",
            "\n",
            "You are using a high-RAM runtime!\n"
          ]
        }
      ]
    },
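    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that dividing by `1e9` reports decimal gigabytes (GB); many OS tools instead report gibibytes (GiB, bytes / 2**30), which are about 7% smaller. A small sketch of the difference (helper names are illustrative):\n",
        "\n",
        "```python\n",
        "def to_gb(n_bytes):\n",
        "    return n_bytes / 1e9      # decimal gigabytes, as in the cell above\n",
        "\n",
        "def to_gib(n_bytes):\n",
        "    return n_bytes / 2**30    # binary gibibytes\n",
        "\n",
        "print(to_gb(2**30), to_gib(2**30))  # 1.073741824 1.0\n",
        "```"
      ]
    },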
    {
      "cell_type": "markdown",
      "source": [
        "## 3. Mounting Google Drive:\n",
        "This cell mounts the user's Google Drive so that trained checkpoints can be saved and the dataset loaded from it."
      ],
      "metadata": {
        "id": "Agpune8JjsQX"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/gdrive')"
      ],
      "metadata": {
        "id": "xFt4umCqjlS0",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "920213cc-3e86-46b5-fcf3-09b82686dc9d"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Mounted at /content/gdrive\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 4. Installing requirements:\n",
        "It clones the DLAS repository, downloads pre-trained model checkpoints, and installs the required dependencies."
      ],
      "metadata": {
        "id": "sr_N55tZjFWg"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EqAlGZg0hAw3",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "outputId": "f1205f06-621b-41ac-f759-c50e2608a7d4"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Cloning into 'DL-Art-School'...\n",
            "remote: Enumerating objects: 16612, done.\u001b[K\n",
            "remote: Total 16612 (delta 0), reused 0 (delta 0), pack-reused 16612\u001b[K\n",
            "Receiving objects: 100% (16612/16612), 12.15 MiB | 18.79 MiB/s, done.\n",
            "Resolving deltas: 100% (13325/13325), done.\n",
            "/content/DL-Art-School\n",
            "--2024-04-11 22:56:15--  https://huggingface.co/Gatozu35/tortoise-tts/resolve/main/dvae.pth\n",
            "Resolving huggingface.co (huggingface.co)... 13.35.7.81, 13.35.7.5, 13.35.7.57, ...\n",
            "Connecting to huggingface.co (huggingface.co)|13.35.7.81|:443... connected.\n",
            "HTTP request sent, awaiting response... 302 Found\n",
            "Location: https://cdn-lfs.huggingface.co/repos/9f/83/9f8300e5ccd418d283e72c07d1851f269dbd2ca7bae49cee21285965d9bebe14/a990825371506c16bcf0e8167bf24ccf82f65bb6a1dbcbfcf058d76f9b197e35?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27dvae.pth%3B+filename%3D%22dvae.pth%22%3B&Expires=1713135376&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxMzEzNTM3Nn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy85Zi84My85ZjgzMDBlNWNjZDQxOGQyODNlNzJjMDdkMTg1MWYyNjlkYmQyY2E3YmFlNDljZWUyMTI4NTk2NWQ5YmViZTE0L2E5OTA4MjUzNzE1MDZjMTZiY2YwZTgxNjdiZjI0Y2NmODJmNjViYjZhMWRiY2JmY2YwNThkNzZmOWIxOTdlMzU%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=S4HBBsbrFqX5sWsHT12AaIpP516Ht0eqNkfbLII-zzZlOl-ybKbwaHV1Vrfa3pt-fSOcsKzTsyyEfYtuFb6imA6ZxtXuwKu0Hda122nxCFyuKqOui5saQHdknzMkTHnLv5vstT%7EFzyBsxpDjhD9AjloDOe7fOVNzcGbh1XgjX2U06FkKbH1RaDXBs0KhR62vTXxtPG%7ERKn5mQ5UmuytyztVxXQl%7EJZroJpLMGFMK1N6bju4M2oDKwJeTouE9tRolHjr2QzmE3RvI1WHyWF9TuQnGPnFdfKqe9C37Jfb1RvOmHzMqMg2nccEk9AJKjsJ%7ETWxIh%7EgDW2ku2KT1ZUOzvw__&Key-Pair-Id=KVTP0A1DKRTAX [following]\n",
            "--2024-04-11 22:56:16--  https://cdn-lfs.huggingface.co/repos/9f/83/9f8300e5ccd418d283e72c07d1851f269dbd2ca7bae49cee21285965d9bebe14/a990825371506c16bcf0e8167bf24ccf82f65bb6a1dbcbfcf058d76f9b197e35?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27dvae.pth%3B+filename%3D%22dvae.pth%22%3B&Expires=1713135376&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxMzEzNTM3Nn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy85Zi84My85ZjgzMDBlNWNjZDQxOGQyODNlNzJjMDdkMTg1MWYyNjlkYmQyY2E3YmFlNDljZWUyMTI4NTk2NWQ5YmViZTE0L2E5OTA4MjUzNzE1MDZjMTZiY2YwZTgxNjdiZjI0Y2NmODJmNjViYjZhMWRiY2JmY2YwNThkNzZmOWIxOTdlMzU%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=S4HBBsbrFqX5sWsHT12AaIpP516Ht0eqNkfbLII-zzZlOl-ybKbwaHV1Vrfa3pt-fSOcsKzTsyyEfYtuFb6imA6ZxtXuwKu0Hda122nxCFyuKqOui5saQHdknzMkTHnLv5vstT%7EFzyBsxpDjhD9AjloDOe7fOVNzcGbh1XgjX2U06FkKbH1RaDXBs0KhR62vTXxtPG%7ERKn5mQ5UmuytyztVxXQl%7EJZroJpLMGFMK1N6bju4M2oDKwJeTouE9tRolHjr2QzmE3RvI1WHyWF9TuQnGPnFdfKqe9C37Jfb1RvOmHzMqMg2nccEk9AJKjsJ%7ETWxIh%7EgDW2ku2KT1ZUOzvw__&Key-Pair-Id=KVTP0A1DKRTAX\n",
            "Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 13.35.7.14, 13.35.7.113, 13.35.7.93, ...\n",
            "Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|13.35.7.14|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 239873286 (229M) [binary/octet-stream]\n",
            "Saving to: ‘experiments/dvae.pth’\n",
            "\n",
            "experiments/dvae.pt 100%[===================>] 228.76M  19.6MB/s    in 13s     \n",
            "\n",
            "2024-04-11 22:56:30 (17.2 MB/s) - ‘experiments/dvae.pth’ saved [239873286/239873286]\n",
            "\n",
            "--2024-04-11 22:56:30--  https://huggingface.co/jbetker/tortoise-tts-v2/resolve/main/.models/autoregressive.pth\n",
            "Resolving huggingface.co (huggingface.co)... 13.35.7.81, 13.35.7.5, 13.35.7.57, ...\n",
            "Connecting to huggingface.co (huggingface.co)|13.35.7.81|:443... connected.\n",
            "HTTP request sent, awaiting response... 302 Found\n",
            "Location: https://cdn-lfs.huggingface.co/repos/57/08/5708d35b6a2408439b3511ccdaa992754f65f8e182278ecb9f654fc46a46f9fb/9c6651b9996df6cef6a1fc459738ae207ab60f902ec49b4d0623ca8ab6110d51?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27autoregressive.pth%3B+filename%3D%22autoregressive.pth%22%3B&Expires=1713135390&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxMzEzNTM5MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy81Ny8wOC81NzA4ZDM1YjZhMjQwODQzOWIzNTExY2NkYWE5OTI3NTRmNjVmOGUxODIyNzhlY2I5ZjY1NGZjNDZhNDZmOWZiLzljNjY1MWI5OTk2ZGY2Y2VmNmExZmM0NTk3MzhhZTIwN2FiNjBmOTAyZWM0OWI0ZDA2MjNjYThhYjYxMTBkNTE%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=M4tMVMr1Ts4scO8cohIRyTXgKU0wHzcCQUfyFuhsg9vCUQtWkSsa6fQsfDSuRcNqtPaCKbJuec3NS3oz2uIxW1AfMaY-UUYnP8iwjJygwEB9jLLsF92OENwGKOLeLvF8WorUgui7ecB638skd4pCMMhqQ-zs1Vhp%7EhRNeEC12cHp1bd24AMXmehM4qheor%7EtkRv0HjPCsGZSbPdtU7hNp6KCP5Q6OZuESCvyENA7vbZyhr4bz-TzFGlq%7E%7Emog9hpj8bCSzL4dJqH0pBBb0dSWCEEdDGwL3d%7EQilHb2XsxwVteDjbYxZjKhPXlrN70ohHlxHYpDWIlpOvhKznyvKzbQ__&Key-Pair-Id=KVTP0A1DKRTAX [following]\n",
            "--2024-04-11 22:56:30--  https://cdn-lfs.huggingface.co/repos/57/08/5708d35b6a2408439b3511ccdaa992754f65f8e182278ecb9f654fc46a46f9fb/9c6651b9996df6cef6a1fc459738ae207ab60f902ec49b4d0623ca8ab6110d51?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27autoregressive.pth%3B+filename%3D%22autoregressive.pth%22%3B&Expires=1713135390&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxMzEzNTM5MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy81Ny8wOC81NzA4ZDM1YjZhMjQwODQzOWIzNTExY2NkYWE5OTI3NTRmNjVmOGUxODIyNzhlY2I5ZjY1NGZjNDZhNDZmOWZiLzljNjY1MWI5OTk2ZGY2Y2VmNmExZmM0NTk3MzhhZTIwN2FiNjBmOTAyZWM0OWI0ZDA2MjNjYThhYjYxMTBkNTE%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=M4tMVMr1Ts4scO8cohIRyTXgKU0wHzcCQUfyFuhsg9vCUQtWkSsa6fQsfDSuRcNqtPaCKbJuec3NS3oz2uIxW1AfMaY-UUYnP8iwjJygwEB9jLLsF92OENwGKOLeLvF8WorUgui7ecB638skd4pCMMhqQ-zs1Vhp%7EhRNeEC12cHp1bd24AMXmehM4qheor%7EtkRv0HjPCsGZSbPdtU7hNp6KCP5Q6OZuESCvyENA7vbZyhr4bz-TzFGlq%7E%7Emog9hpj8bCSzL4dJqH0pBBb0dSWCEEdDGwL3d%7EQilHb2XsxwVteDjbYxZjKhPXlrN70ohHlxHYpDWIlpOvhKznyvKzbQ__&Key-Pair-Id=KVTP0A1DKRTAX\n",
            "Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 13.35.7.14, 13.35.7.113, 13.35.7.93, ...\n",
            "Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|13.35.7.14|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1716988501 (1.6G) [application/zip]\n",
            "Saving to: ‘experiments/autoregressive.pth’\n",
            "\n",
            "experiments/autoreg 100%[===================>]   1.60G  76.0MB/s    in 21s     \n",
            "\n",
            "2024-04-11 22:56:52 (77.3 MB/s) - ‘experiments/autoregressive.pth’ saved [1716988501/1716988501]\n",
            "\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 2)) (1.25.2)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 3)) (6.0.1)\n",
            "Collecting tb-nightly (from -r codes/requirements.laxed.txt (line 4))\n",
            "  Downloading tb_nightly-2.17.0a20240411-py3-none-any.whl (5.5 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.5/5.5 MB\u001b[0m \u001b[31m12.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: future in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 5)) (0.18.3)\n",
            "Collecting scp (from -r codes/requirements.laxed.txt (line 6))\n",
            "  Downloading scp-0.14.5-py2.py3-none-any.whl (8.7 kB)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 7)) (4.66.2)\n",
            "Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 8)) (3.7.1)\n",
            "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 9)) (1.11.4)\n",
            "Collecting munch (from -r codes/requirements.laxed.txt (line 10))\n",
            "  Downloading munch-4.0.0-py2.py3-none-any.whl (9.9 kB)\n",
            "Requirement already satisfied: tensorboard in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 13)) (2.15.2)\n",
            "Collecting orjson (from -r codes/requirements.laxed.txt (line 14))\n",
            "  Downloading orjson-3.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.8/144.8 kB\u001b[0m \u001b[31m20.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting einops (from -r codes/requirements.laxed.txt (line 15))\n",
            "  Downloading einops-0.7.0-py3-none-any.whl (44 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.6/44.6 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting lambda-networks (from -r codes/requirements.laxed.txt (line 16))\n",
            "  Downloading lambda_networks-0.4.0-py3-none-any.whl (5.0 kB)\n",
            "Collecting mup (from -r codes/requirements.laxed.txt (line 17))\n",
            "  Downloading mup-1.0.0.tar.gz (28 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting customtkinter (from -r codes/requirements.laxed.txt (line 20))\n",
            "  Downloading customtkinter-5.2.2-py3-none-any.whl (296 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m296.1/296.1 kB\u001b[0m \u001b[31m22.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting ruamel.yaml (from -r codes/requirements.laxed.txt (line 21))\n",
            "  Downloading ruamel.yaml-0.18.6-py3-none-any.whl (117 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.8/117.8 kB\u001b[0m \u001b[31m195.3 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: opencv-python in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 23)) (4.8.0.76)\n",
            "Collecting kornia (from -r codes/requirements.laxed.txt (line 24))\n",
            "  Downloading kornia-0.7.2-py2.py3-none-any.whl (825 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m825.4/825.4 kB\u001b[0m \u001b[31m23.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pytorch_ssim (from -r codes/requirements.laxed.txt (line 25))\n",
            "  Downloading pytorch_ssim-0.1.tar.gz (1.4 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting gsa-pytorch (from -r codes/requirements.laxed.txt (line 26))\n",
            "  Downloading gsa_pytorch-0.2.2-py3-none-any.whl (3.6 kB)\n",
            "Collecting pytorch_fid (from -r codes/requirements.laxed.txt (line 27))\n",
            "  Downloading pytorch_fid-0.3.0-py3-none-any.whl (15 kB)\n",
            "Requirement already satisfied: inflect in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 30)) (7.0.0)\n",
            "Requirement already satisfied: librosa in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 31)) (0.10.1)\n",
            "Collecting Unidecode (from -r codes/requirements.laxed.txt (line 32))\n",
            "  Downloading Unidecode-1.3.8-py3-none-any.whl (235 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m235.5/235.5 kB\u001b[0m \u001b[31m24.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting tgt (from -r codes/requirements.laxed.txt (line 33))\n",
            "  Downloading tgt-1.5-py3-none-any.whl (41 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m41.6/41.6 kB\u001b[0m \u001b[31m6.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pyworld (from -r codes/requirements.laxed.txt (line 34))\n",
            "  Downloading pyworld-0.3.4.tar.gz (251 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m252.0/252.0 kB\u001b[0m \u001b[31m27.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting audio2numpy (from -r codes/requirements.laxed.txt (line 35))\n",
            "  Downloading audio2numpy-0.1.2-py3-none-any.whl (10 kB)\n",
            "Requirement already satisfied: SoundFile in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 36)) (0.12.1)\n",
            "Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 39)) (4.38.2)\n",
            "Requirement already satisfied: tokenizers in /usr/local/lib/python3.10/dist-packages (from -r codes/requirements.laxed.txt (line 40)) (0.15.2)\n",
            "Collecting jiwer (from -r codes/requirements.laxed.txt (line 41))\n",
            "  Downloading jiwer-3.0.3-py3-none-any.whl (21 kB)\n",
            "Collecting omegaconf (from -r codes/requirements.laxed.txt (line 42))\n",
            "  Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting vector_quantize_pytorch (from -r codes/requirements.laxed.txt (line 45))\n",
            "  Downloading vector_quantize_pytorch-1.14.6-py3-none-any.whl (28 kB)\n",
            "Collecting linear_attention_transformer (from -r codes/requirements.laxed.txt (line 46))\n",
            "  Downloading linear_attention_transformer-0.19.1-py3-none-any.whl (12 kB)\n",
            "Collecting rotary-embedding-torch (from -r codes/requirements.laxed.txt (line 47))\n",
            "  Downloading rotary_embedding_torch-0.5.3-py3-none-any.whl (5.3 kB)\n",
            "Collecting axial_positional_embedding (from -r codes/requirements.laxed.txt (line 48))\n",
            "  Downloading axial_positional_embedding-0.2.1.tar.gz (2.6 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting g-mlp-pytorch (from -r codes/requirements.laxed.txt (line 49))\n",
            "  Downloading g_mlp_pytorch-0.1.5-py3-none-any.whl (5.9 kB)\n",
            "Collecting x-clip (from -r codes/requirements.laxed.txt (line 50))\n",
            "  Downloading x_clip-0.14.4-py3-none-any.whl (1.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.4/1.4 MB\u001b[0m \u001b[31m26.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting x_transformers==1.0.4 (from -r codes/requirements.laxed.txt (line 51))\n",
            "  Downloading x_transformers-1.0.4-py3-none-any.whl (16 kB)\n",
            "Collecting bitsandbytes (from -r codes/requirements.laxed.txt (line 54))\n",
            "  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.8/119.8 MB\u001b[0m \u001b[31m14.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting lion-pytorch==0.0.7 (from -r codes/requirements.laxed.txt (line 55))\n",
            "  Downloading lion_pytorch-0.0.7-py3-none-any.whl (4.3 kB)\n",
            "Requirement already satisfied: torch>=1.6 in /usr/local/lib/python3.10/dist-packages (from x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51)) (2.2.1+cu121)\n",
            "Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (1.4.0)\n",
            "Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (1.62.1)\n",
            "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (3.6)\n",
            "Requirement already satisfied: protobuf!=4.24.0,<5.0.0,>=3.19.6 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (3.20.3)\n",
            "Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (67.7.2)\n",
            "Requirement already satisfied: six>1.9 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (1.16.0)\n",
            "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (0.7.2)\n",
            "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tb-nightly->-r codes/requirements.laxed.txt (line 4)) (3.0.2)\n",
            "Collecting paramiko (from scp->-r codes/requirements.laxed.txt (line 6))\n",
            "  Downloading paramiko-3.4.0-py3-none-any.whl (225 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m225.9/225.9 kB\u001b[0m \u001b[31m31.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (1.2.1)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (0.12.1)\n",
            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (4.51.0)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (1.4.5)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (24.0)\n",
            "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (9.4.0)\n",
            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (3.1.2)\n",
            "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->-r codes/requirements.laxed.txt (line 8)) (2.8.2)\n",
            "Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard->-r codes/requirements.laxed.txt (line 13)) (2.27.0)\n",
            "Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard->-r codes/requirements.laxed.txt (line 13)) (1.2.0)\n",
            "Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard->-r codes/requirements.laxed.txt (line 13)) (2.31.0)\n",
            "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from mup->-r codes/requirements.laxed.txt (line 17)) (2.0.3)\n",
            "Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (from mup->-r codes/requirements.laxed.txt (line 17)) (0.17.1+cu121)\n",
            "Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from mup->-r codes/requirements.laxed.txt (line 17)) (0.13.1)\n",
            "Collecting darkdetect (from customtkinter->-r codes/requirements.laxed.txt (line 20))\n",
            "  Downloading darkdetect-0.8.0-py3-none-any.whl (9.0 kB)\n",
            "Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml->-r codes/requirements.laxed.txt (line 21))\n",
            "  Downloading ruamel.yaml.clib-0.2.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (526 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m526.7/526.7 kB\u001b[0m \u001b[31m57.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting kornia-rs>=0.1.0 (from kornia->-r codes/requirements.laxed.txt (line 24))\n",
            "  Downloading kornia_rs-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.4/2.4 MB\u001b[0m \u001b[31m98.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pydantic>=1.9.1 in /usr/local/lib/python3.10/dist-packages (from inflect->-r codes/requirements.laxed.txt (line 30)) (2.6.4)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from inflect->-r codes/requirements.laxed.txt (line 30)) (4.11.0)\n",
            "Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (3.0.1)\n",
            "Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (1.2.2)\n",
            "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (1.4.0)\n",
            "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (4.4.2)\n",
            "Requirement already satisfied: numba>=0.51.0 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (0.58.1)\n",
            "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (1.8.1)\n",
            "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (0.3.7)\n",
            "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (0.4)\n",
            "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa->-r codes/requirements.laxed.txt (line 31)) (1.0.8)\n",
            "Requirement already satisfied: cython>=0.24 in /usr/local/lib/python3.10/dist-packages (from pyworld->-r codes/requirements.laxed.txt (line 34)) (3.0.10)\n",
            "Collecting ffmpeg (from audio2numpy->-r codes/requirements.laxed.txt (line 35))\n",
            "  Downloading ffmpeg-1.4.tar.gz (5.1 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from SoundFile->-r codes/requirements.laxed.txt (line 36)) (1.16.0)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers->-r codes/requirements.laxed.txt (line 39)) (3.13.4)\n",
            "Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /usr/local/lib/python3.10/dist-packages (from transformers->-r codes/requirements.laxed.txt (line 39)) (0.20.3)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->-r codes/requirements.laxed.txt (line 39)) (2023.12.25)\n",
            "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers->-r codes/requirements.laxed.txt (line 39)) (0.4.2)\n",
            "Requirement already satisfied: click<9.0.0,>=8.1.3 in /usr/local/lib/python3.10/dist-packages (from jiwer->-r codes/requirements.laxed.txt (line 41)) (8.1.7)\n",
            "Collecting rapidfuzz<4,>=3 (from jiwer->-r codes/requirements.laxed.txt (line 41))\n",
            "  Downloading rapidfuzz-3.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.4/3.4 MB\u001b[0m \u001b[31m103.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting antlr4-python3-runtime==4.9.* (from omegaconf->-r codes/requirements.laxed.txt (line 42))\n",
            "  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m18.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting einx[torch]>=0.1.3 (from vector_quantize_pytorch->-r codes/requirements.laxed.txt (line 45))\n",
            "  Downloading einx-0.2.0.tar.gz (71 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.8/71.8 kB\u001b[0m \u001b[31m11.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting linformer>=0.1.0 (from linear_attention_transformer->-r codes/requirements.laxed.txt (line 46))\n",
            "  Downloading linformer-0.2.3-py3-none-any.whl (6.2 kB)\n",
            "Collecting local-attention (from linear_attention_transformer->-r codes/requirements.laxed.txt (line 46))\n",
            "  Downloading local_attention-1.9.0-py3-none-any.whl (8.2 kB)\n",
            "Collecting product-key-memory>=0.1.5 (from linear_attention_transformer->-r codes/requirements.laxed.txt (line 46))\n",
            "  Downloading product_key_memory-0.2.10-py3-none-any.whl (6.4 kB)\n",
            "Collecting beartype (from rotary-embedding-torch->-r codes/requirements.laxed.txt (line 47))\n",
            "  Downloading beartype-0.18.2-py3-none-any.whl (903 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m903.7/903.7 kB\u001b[0m \u001b[31m73.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting ftfy (from x-clip->-r codes/requirements.laxed.txt (line 50))\n",
            "  Downloading ftfy-6.2.0-py3-none-any.whl (54 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.4/54.4 kB\u001b[0m \u001b[31m8.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->SoundFile->-r codes/requirements.laxed.txt (line 36)) (2.22)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from einx[torch]>=0.1.3->vector_quantize_pytorch->-r codes/requirements.laxed.txt (line 45)) (1.12)\n",
            "Requirement already satisfied: frozendict in /usr/local/lib/python3.10/dist-packages (from einx[torch]>=0.1.3->vector_quantize_pytorch->-r codes/requirements.laxed.txt (line 45)) (2.4.1)\n",
            "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->-r codes/requirements.laxed.txt (line 13)) (5.3.3)\n",
            "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->-r codes/requirements.laxed.txt (line 13)) (0.4.0)\n",
            "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->-r codes/requirements.laxed.txt (line 13)) (4.9)\n",
            "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<2,>=0.5->tensorboard->-r codes/requirements.laxed.txt (line 13)) (1.3.1)\n",
            "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.19.3->transformers->-r codes/requirements.laxed.txt (line 39)) (2023.6.0)\n",
            "Requirement already satisfied: llvmlite<0.42,>=0.41.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.51.0->librosa->-r codes/requirements.laxed.txt (line 31)) (0.41.1)\n",
            "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.0->librosa->-r codes/requirements.laxed.txt (line 31)) (4.2.0)\n",
            "Collecting colt5-attention>=0.10.14 (from product-key-memory>=0.1.5->linear_attention_transformer->-r codes/requirements.laxed.txt (line 46))\n",
            "  Downloading CoLT5_attention-0.10.20-py3-none-any.whl (18 kB)\n",
            "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.9.1->inflect->-r codes/requirements.laxed.txt (line 30)) (0.6.0)\n",
            "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.9.1->inflect->-r codes/requirements.laxed.txt (line 30)) (2.16.3)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard->-r codes/requirements.laxed.txt (line 13)) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard->-r codes/requirements.laxed.txt (line 13)) (3.6)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard->-r codes/requirements.laxed.txt (line 13)) (2.0.7)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.21.0->tensorboard->-r codes/requirements.laxed.txt (line 13)) (2024.2.2)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.20.0->librosa->-r codes/requirements.laxed.txt (line 31)) (3.4.0)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51)) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51)) (3.1.3)\n",
            "Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n",
            "Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n",
            "Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n",
            "Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n",
            "Collecting nvidia-cublas-cu12==12.1.3.1 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n",
            "Collecting nvidia-cufft-cu12==11.0.2.54 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n",
            "Collecting nvidia-curand-cu12==10.3.2.106 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n",
            "Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n",
            "Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n",
            "Collecting nvidia-nccl-cu12==2.19.3 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)\n",
            "Collecting nvidia-nvtx-cu12==12.1.105 (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n",
            "Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51)) (2.2.0)\n",
            "Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch>=1.6->x_transformers==1.0.4->-r codes/requirements.laxed.txt (line 51))\n",
            "  Using cached nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)\n",
            "Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from werkzeug>=1.0.1->tb-nightly->-r codes/requirements.laxed.txt (line 4)) (2.1.5)\n",
            "Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /usr/local/lib/python3.10/dist-packages (from ftfy->x-clip->-r codes/requirements.laxed.txt (line 50)) (0.2.13)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->mup->-r codes/requirements.laxed.txt (line 17)) (2023.4)\n",
            "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->mup->-r codes/requirements.laxed.txt (line 17)) (2024.1)\n",
            "Collecting bcrypt>=3.2 (from paramiko->scp->-r codes/requirements.laxed.txt (line 6))\n",
            "  Downloading bcrypt-4.1.2-cp39-abi3-manylinux_2_28_x86_64.whl (698 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m698.9/698.9 kB\u001b[0m \u001b[31m62.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: cryptography>=3.3 in /usr/local/lib/python3.10/dist-packages (from paramiko->scp->-r codes/requirements.laxed.txt (line 6)) (42.0.5)\n",
            "Collecting pynacl>=1.5 (from paramiko->scp->-r codes/requirements.laxed.txt (line 6))\n",
            "  Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m856.7/856.7 kB\u001b[0m \u001b[31m74.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->-r codes/requirements.laxed.txt (line 13)) (0.6.0)\n",
            "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard->-r codes/requirements.laxed.txt (line 13)) (3.2.2)\n",
            "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->einx[torch]>=0.1.3->vector_quantize_pytorch->-r codes/requirements.laxed.txt (line 45)) (1.3.0)\n",
            "Building wheels for collected packages: mup, pytorch_ssim, pyworld, antlr4-python3-runtime, axial_positional_embedding, ffmpeg, einx\n",
            "  Building wheel for mup (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for mup: filename=mup-1.0.0-py3-none-any.whl size=23630 sha256=a10a92a1590b6c7c7f1996bd02a4ff3d009725168609453c3731d75a60116190\n",
            "  Stored in directory: /root/.cache/pip/wheels/f4/c8/88/3c23a3d10c50053b6552d2d30aee5b53ba89a47f742420036c\n",
            "  Building wheel for pytorch_ssim (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for pytorch_ssim: filename=pytorch_ssim-0.1-py3-none-any.whl size=2003 sha256=0d8fa581c3be5a02613d92543dc82b1d7b3b289824251a8afe6426952e5d7188\n",
            "  Stored in directory: /root/.cache/pip/wheels/2e/0c/10/4a3f91bd610b23196f1e28f8af80b3ec86786b50f3e86dc21e\n",
            "  Building wheel for pyworld (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for pyworld: filename=pyworld-0.3.4-cp310-cp310-linux_x86_64.whl size=865338 sha256=25e09ac38f1caf5a13b3882bc021f2d98bad605dc2fd04064c4636e35004a4a8\n",
            "  Stored in directory: /root/.cache/pip/wheels/66/09/8a/a1d79b73d59756f66e9bfe55a199840efc7473adb76ddacdfd\n",
            "  Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=d3042ab3983cad83bdb262deb1b449721e13d5629ef10c7c8356a3b7430abaef\n",
            "  Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88\n",
            "  Building wheel for axial_positional_embedding (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for axial_positional_embedding: filename=axial_positional_embedding-0.2.1-py3-none-any.whl size=2882 sha256=33a89f792afba5bc5d45fbdfdd345afde8e50fdce531313812df37b0af11e8a8\n",
            "  Stored in directory: /root/.cache/pip/wheels/b1/cb/39/7ce7ff2d2fd37cfe1fe7b3a3c43cf410632b2ad3b3f3986d73\n",
            "  Building wheel for ffmpeg (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for ffmpeg: filename=ffmpeg-1.4-py3-none-any.whl size=6082 sha256=4068f44fab957e3f9ce182d8387bd113b9037ed690455c180343ad417f5b43ea\n",
            "  Stored in directory: /root/.cache/pip/wheels/8e/7a/69/cd6aeb83b126a7f04cbe7c9d929028dc52a6e7d525ff56003a\n",
            "  Building wheel for einx (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for einx: filename=einx-0.2.0-py3-none-any.whl size=87998 sha256=cd13f1d53f6373089134ac30231c4a8a372d7a52f96827bbb90779ddcad2a951\n",
            "  Stored in directory: /root/.cache/pip/wheels/12/4a/44/a5ee7c1b3de3ff83d42be6383107533e069b48a326c6f28fbe\n",
            "Successfully built mup pytorch_ssim pyworld antlr4-python3-runtime axial_positional_embedding ffmpeg einx\n",
            "Installing collected packages: tgt, pytorch_ssim, ffmpeg, antlr4-python3-runtime, Unidecode, ruamel.yaml.clib, rapidfuzz, pyworld, orjson, omegaconf, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, munch, kornia-rs, ftfy, einops, darkdetect, beartype, bcrypt, audio2numpy, tb-nightly, ruamel.yaml, pynacl, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jiwer, einx, customtkinter, paramiko, nvidia-cusolver-cu12, scp, x_transformers, rotary-embedding-torch, local-attention, lion-pytorch, linformer, lambda-networks, kornia, gsa-pytorch, g-mlp-pytorch, bitsandbytes, axial_positional_embedding, x-clip, vector_quantize_pytorch, pytorch_fid, mup, colt5-attention, product-key-memory, linear_attention_transformer\n",
            "Successfully installed Unidecode-1.3.8 antlr4-python3-runtime-4.9.3 audio2numpy-0.1.2 axial_positional_embedding-0.2.1 bcrypt-4.1.2 beartype-0.18.2 bitsandbytes-0.43.1 colt5-attention-0.10.20 customtkinter-5.2.2 darkdetect-0.8.0 einops-0.7.0 einx-0.2.0 ffmpeg-1.4 ftfy-6.2.0 g-mlp-pytorch-0.1.5 gsa-pytorch-0.2.2 jiwer-3.0.3 kornia-0.7.2 kornia-rs-0.1.3 lambda-networks-0.4.0 linear_attention_transformer-0.19.1 linformer-0.2.3 lion-pytorch-0.0.7 local-attention-1.9.0 munch-4.0.0 mup-1.0.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.1.105 omegaconf-2.3.0 orjson-3.10.0 paramiko-3.4.0 product-key-memory-0.2.10 pynacl-1.5.0 pytorch_fid-0.3.0 pytorch_ssim-0.1 pyworld-0.3.4 rapidfuzz-3.8.1 rotary-embedding-torch-0.5.3 ruamel.yaml-0.18.6 ruamel.yaml.clib-0.2.8 scp-0.14.5 tb-nightly-2.17.0a20240411 tgt-1.5 vector_quantize_pytorch-1.14.6 x-clip-0.14.4 x_transformers-1.0.4\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.colab-display-data+json": {
              "pip_warning": {
                "packages": [
                  "pydevd_plugins"
                ]
              },
              "id": "e43db29af7a647868881a2d864d04cca"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "!git clone https://github.com/josuebatista/DL-Art-School.git\n",
        "%cd DL-Art-School\n",
        "!wget https://huggingface.co/Gatozu35/tortoise-tts/resolve/main/dvae.pth -O experiments/dvae.pth\n",
        "!wget https://huggingface.co/jbetker/tortoise-tts-v2/resolve/main/.models/autoregressive.pth -O experiments/autoregressive.pth\n",
        "!pip install -r codes/requirements.laxed.txt"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# MUST RESTART SESSION BEFORE PROCEEDING\n",
        "The following packages were previously imported in this runtime:\n",
        "  \n",
        "  `[pydevd_plugins]`\n",
        "\n",
        "You must restart the runtime in order to use newly installed versions.\n",
        "\n",
        "Restarting will lose all runtime state, including local variables.\n",
        "\n",
        "In Google Colab, from the top menu, select `Runtime`, then `Restart session`.\n",
        "<img src=\"https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/Restart_the_runtime_600x102.png\" width=600>\n",
        "\n",
        "\n",
        "## Check the integrity of the dVAE checkpoint\n",
        "You should see the following message when verified:\n",
        "\n",
        "`a990825371506c16bcf0e8167bf24ccf82f65bb6a1dbcbfcf058d76f9b197e35  ../DL-Art-School/experiments/dvae.pth`\n"
      ],
      "metadata": {
        "id": "6vZtdWXc4g9D"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "'''\n",
        "Downgrading Transformers to a previous version.\n",
        "It seems that the latest release of transformers broke the model loader.\n",
        "'''\n",
        "!pip install transformers==4.29.2\n",
        "\n",
        "# Check the integrity of the dVAE checkpoint\n",
        "!sha256sum /content/DL-Art-School/experiments/dvae.pth | grep a990825371506c16bcf0e8167bf24ccf82f65bb6a1dbcbfcf058d76f9b197e35 || echo \"SOMETHING IS WRONG WITH THE CHECKPOINT; REPORT THIS AS A GITHUB ISSUE AND DO NOT PROCEED\""
      ],
      "metadata": {
        "id": "9b359b_FnIZz",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "a774b06b-b0b7-41a2-f279-429a6e2f9d8e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting transformers==4.29.2\n",
            "  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)\n",
            "\u001b[?25l     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/7.1 MB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K     \u001b[91m╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.1/7.1 MB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:02\u001b[0m\r\u001b[2K     \u001b[91m━━━━━\u001b[0m\u001b[90m╺\u001b[0m\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.9/7.1 MB\u001b[0m \u001b[31m13.8 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K     \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m\u001b[90m━━━━━━━━━━\u001b[0m \u001b[32m5.2/7.1 MB\u001b[0m \u001b[31m50.9 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K     \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[91m╸\u001b[0m \u001b[32m7.1/7.1 MB\u001b[0m \u001b[31m63.0 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.1/7.1 MB\u001b[0m \u001b[31m46.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (3.13.4)\n",
            "Requirement already satisfied: huggingface-hub<1.0,>=0.14.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (0.20.3)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (1.25.2)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (24.0)\n",
            "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (6.0.1)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (2023.12.25)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (2.31.0)\n",
            "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.29.2)\n",
            "  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m96.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.29.2) (4.66.2)\n",
            "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.29.2) (2023.6.0)\n",
            "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.29.2) (4.11.0)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.29.2) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.29.2) (3.6)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.29.2) (2.0.7)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.29.2) (2024.2.2)\n",
            "Installing collected packages: tokenizers, transformers\n",
            "  Attempting uninstall: tokenizers\n",
            "    Found existing installation: tokenizers 0.15.2\n",
            "    Uninstalling tokenizers-0.15.2:\n",
            "      Successfully uninstalled tokenizers-0.15.2\n",
            "  Attempting uninstall: transformers\n",
            "    Found existing installation: transformers 4.38.2\n",
            "    Uninstalling transformers-4.38.2:\n",
            "      Successfully uninstalled transformers-4.38.2\n",
            "Successfully installed tokenizers-0.13.3 transformers-4.29.2\n",
            "a990825371506c16bcf0e8167bf24ccf82f65bb6a1dbcbfcf058d76f9b197e35  /content/DL-Art-School/experiments/dvae.pth\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 5. Calculating hyperparameters:\n",
        "This section automatically calculates suggested hyperparameters for training based on the provided dataset sizes. It adjusts the batch sizes to minimize leftover samples in each epoch, calculates the number of steps per epoch, and determines the frequencies for learning rate decay, validation, and checkpoint saving. Hyperparameters are crucial as they directly control the training algorithm's behavior and significantly impact the model's performance.\n",
        "\n",
        "To find the paths for `Dataset_Training_Path` and `ValidationDataset_Training_Path`, open Google Colab's `Files` panel and look for the `output` directory, where the DJ format datasets from the previous step (notebook #2) are stored.\n",
        "\n",
        "<img src=\"https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/ch09_3_Google_Colab_DJ_format_directory.JPG\" width=600>\n"
      ],
      "metadata": {
        "id": "BQTong_qj7sJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from pathlib import Path\n",
        "from math import ceil\n",
        "DEFAULT_TRAIN_BS = 64\n",
        "DEFAULT_VAL_BS = 32\n",
        "#@markdown # Hyperparameter calculation\n",
        "#@markdown Run this cell to obtain suggested parameters for training\n",
        "Dataset_Training_Path = \"/content/gdrive/MyDrive/output/Learn_OAI_Whisper_Sample_Audio01.mp3_2024_04_11-21_39/train.txt\" #@param {type:\"string\"}\n",
        "ValidationDataset_Training_Path = \"/content/gdrive/MyDrive/output/Learn_OAI_Whisper_Sample_Audio01.mp3_2024_04_11-21_39/valid.txt\" #@param {type:\"string\"}\n",
        "\n",
        "#@markdown ### **NOTE**: Dataset must be in the following format.\n",
        "\n",
        "#@markdown  `dataset/`\n",
        "#@markdown * ---├── `val.txt`\n",
        "#@markdown * ---├── `train.txt`\n",
        "#@markdown * ---├── `wavs/`\n",
        "\n",
        "#@markdown `wavs/` directory must contain `.wav` files.\n",
        "\n",
        "#@markdown  Example for `train.txt` and `val.txt`:\n",
        "\n",
        "#@markdown * `wavs/A.wav|Write the transcribed audio here.`\n",
        "\n",
        "#@markdown todo: actually check the dataset structure\n",
        "\n",
        "if Dataset_Training_Path == ValidationDataset_Training_Path:\n",
        "  print(\"WARNING: training dataset path == validation dataset path!!!\")\n",
        "  print(\"\\tThis is technically okay but will make all of the validation metrics useless. \")\n",
        "  print(\"It will also SUBSTANTIALLY slow down training, because validation datasets are supposed to be much smaller than training ones.\")\n",
        "\n",
        "def txt_file_lines(p: str) -> int:\n",
        "  return len(Path(p).read_text().strip().split('\\n'))\n",
        "training_samples = txt_file_lines(Dataset_Training_Path)\n",
        "val_samples = txt_file_lines(ValidationDataset_Training_Path)\n",
        "\n",
        "if training_samples < 128: print(\"WARNING: very small dataset! the smallest dataset tested thus far had ~200 samples.\")\n",
        "if val_samples < 20: print(\"WARNING: very small validation dataset! val batch size will be scaled down to account\")\n",
        "\n",
        "def div_spillover(n: int, bs: int) -> int: # returns new batch size\n",
        "  epoch_steps,remain = divmod(n,bs)\n",
        "  if epoch_steps*2 > bs: return bs # don't bother optimising this stuff if epoch_steps are high\n",
        "  if not remain: return bs # unlikely but still\n",
        "\n",
        "  if remain*2 < bs: # \"easier\" to get rid of remainder -- should increase bs\n",
        "    target_bs = n//epoch_steps\n",
        "  else: # easier to increase epoch_steps by 1 -- decrease bs\n",
        "    target_bs = n//(epoch_steps+1)\n",
        "  assert n%target_bs < epoch_steps+2 # should be very few extra\n",
        "  return target_bs\n",
        "\n",
        "if training_samples < DEFAULT_TRAIN_BS:\n",
        "  print(\"WARNING: dataset is smaller than a single batch. This will almost certainly perform poorly. Trying anyway\")\n",
        "  train_bs = training_samples\n",
        "else:\n",
        "  train_bs = div_spillover(training_samples, DEFAULT_TRAIN_BS)\n",
        "if val_samples < DEFAULT_VAL_BS:\n",
        "  val_bs = val_samples\n",
        "else:\n",
        "  val_bs = div_spillover(val_samples, DEFAULT_VAL_BS)\n",
        "\n",
        "steps_per_epoch = training_samples//train_bs\n",
        "lr_decay_epochs = [20, 40, 56, 72]\n",
        "lr_decay_steps = [steps_per_epoch * e for e in lr_decay_epochs]\n",
        "print_freq = min(100, max(20, steps_per_epoch))\n",
        "val_freq = save_checkpoint_freq = print_freq * 3\n",
        "\n",
        "print(\"===CALCULATED SETTINGS===\")\n",
        "print(f'{train_bs=} {val_bs=}')\n",
        "print(f'{val_freq=} {lr_decay_steps=}')\n",
        "print(f'{print_freq=} {save_checkpoint_freq=}')"
      ],
      "metadata": {
        "id": "lQQC93cjfmNd",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "38798f47-1d72-4f86-eb98-67481485465b"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "WARNING: very small dataset! the smallest dataset tested thus far had ~200 samples.\n",
            "WARNING: very small validation dataset! val batch size will be scaled down to account\n",
            "WARNING: dataset is smaller than a single batch. This will almost certainly perform poorly. Trying anyway\n",
            "===CALCULATED SETTINGS===\n",
            "train_bs=4 val_bs=1\n",
            "val_freq=60 lr_decay_steps=[20, 40, 56, 72]\n",
            "print_freq=20 save_checkpoint_freq=60\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 6. Training settings:\n",
        "This section lets you customize the training settings to match your requirements and available resources. It provides flexibility in naming the experiment, specifying dataset names, enabling or disabling certain features, and overriding the calculated settings. The code also includes notes and warnings to guide you in making appropriate choices based on your system's storage and computational capabilities."
      ],
      "metadata": {
        "id": "AFLjQOYOtqTa"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#@markdown ##_Settings for normal users:_\n",
        "Experiment_Name = \"Learn_OAI_Whisper_20240411_JRB\" #@param {type:\"string\"}\n",
        "Dataset_Training_Name= \"TestDataset\" #@param {type:\"string\"}\n",
        "ValidationDataset_Name = \"TestValidation\" # note: only sets the validation set's name in the YAML @param {type:\"string\"}\n",
        "SaveTrainingStates = False # @param {type:\"boolean\"}\n",
        "Keep_Last_N_Checkpoints = 0 #@param {type:\"slider\", min:0, max:10, step:1}\n",
        "#@markdown * **NOTE**: 0 means \"keep all models saved\", which could potentially cause out-of-storage issues.\n",
        "#@markdown * Without training states, each model \"only\" takes up ~1.6GB. You should have ~50GB of free space to begin with.\n",
        "#@markdown * With training states, each model (pth+state) takes up ~4.9 GB; Colab will crash around ~10 undeleted checkpoints in this case.\n",
        "\n",
        "#@markdown ##_Other training parameters_\n",
        "Fp16 = False #@param {type:\"boolean\"}\n",
        "Use8bit = True #@param {type:\"boolean\"}\n",
        "#@markdown * **NOTE**: fp16 does not appear to reduce VRAM use when combined with 8-bit training; this has not been fully verified.\n",
        "TrainingRate = \"1e-5\" #@param {type:\"string\"}\n",
        "TortoiseCompat = False #@param {type:\"boolean\"}\n",
        "\n",
        "#@markdown * **NOTE**: TortoiseCompat introduces some breaking changes to the training process. **If you want to reproduce older models**, disable this checkbox.\n",
        "\n",
        "#@markdown ##_Calculated settings_ override\n",
        "#@markdown #####Blank entries rely on the calculated defaults from the cell above.\n",
        "#@markdown ######**Leave them blank unless you want to adjust them manually**\n",
        "TrainBS = \"\" #@param {type:\"string\"}\n",
        "ValBS = \"\" #@param {type:\"string\"}\n",
        "ValFreq = \"\" #@param {type:\"string\"}\n",
        "LRDecaySteps = \"\" #@param {type:\"string\"}\n",
        "PrintFreq = \"\" #@param {type:\"string\"}\n",
        "SaveCheckpointFreq = \"\" #@param {type:\"string\"}\n",
        "\n",
        "def take(orig, override):\n",
        "  if override == \"\": return orig\n",
        "  return type(orig)(override)\n",
        "\n",
        "train_bs = take(train_bs, TrainBS)\n",
        "val_bs = take(val_bs, ValBS)\n",
        "val_freq = take(val_freq, ValFreq)\n",
        "import ast  # safer than eval for parsing the user-supplied list\n",
        "lr_decay_steps = ast.literal_eval(LRDecaySteps) if LRDecaySteps else lr_decay_steps\n",
        "print_freq = take(print_freq, PrintFreq)\n",
        "save_checkpoint_freq = take(save_checkpoint_freq, SaveCheckpointFreq)\n",
        "assert len(lr_decay_steps) == 4, 'expected exactly 4 LR decay steps'\n",
        "gen_lr_steps = ', '.join(str(v) for v in lr_decay_steps)\n",
        "\n",
        "#@markdown #Run this cell after you finish editing the settings."
      ],
      "metadata": {
        "id": "ryVeEXfxqw3n"
      },
      "execution_count": null,
      "outputs": []
    },
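    {
      "cell_type": "markdown",
      "source": [
        "As a sanity check, the `take` helper defined above keeps the calculated default when the override box is blank, and otherwise casts the override string to the type of the default. A minimal standalone sketch of that behavior:\n",
        "\n",
        "```python\n",
        "def take(orig, override):\n",
        "    # a blank override keeps the calculated default\n",
        "    if override == \"\":\n",
        "        return orig\n",
        "    # otherwise cast the override string to the default's type (int here)\n",
        "    return type(orig)(override)\n",
        "\n",
        "assert take(4, \"\") == 4    # blank form field -> keep calculated value\n",
        "assert take(4, \"8\") == 8   # \"8\" is cast to int\n",
        "```"
      ],
      "metadata": {
        "id": "take-helper-demo"
      }
    },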
    {
      "cell_type": "markdown",
      "source": [
        "## 7. Applying settings:\n",
        "The code downloads a fresh copy of the example YAML configuration and applies the settings defined above using `sed` substitutions."
      ],
      "metadata": {
        "id": "AutsfX4at6kD"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "%cd /content/DL-Art-School\n",
        "# !wget https://raw.githubusercontent.com/152334H/DL-Art-School/master/experiments/EXAMPLE_gpt.yml -O experiments/EXAMPLE_gpt.yml\n",
        "!wget https://raw.githubusercontent.com/josuebatista/DL-Art-School/master/experiments/EXAMPLE_gpt.yml -O experiments/EXAMPLE_gpt.yml\n",
        "\n",
        "import os\n",
        "%cd /content/DL-Art-School\n",
        "!sed -i 's/batch_size: 128/batch_size: '\"$train_bs\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/batch_size: 64/batch_size: '\"$val_bs\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/val_freq: 500/val_freq: '\"$val_freq\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/500, 1000, 1400, 1800/'\"$gen_lr_steps\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/print_freq: 100/print_freq: '\"$print_freq\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/save_checkpoint_freq: 500/save_checkpoint_freq: '\"$save_checkpoint_freq\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "\n",
        "!sed -i 's+CHANGEME_validation_dataset_name+'\"$ValidationDataset_Name\"'+g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's+CHANGEME_path_to_validation_dataset+'\"$ValidationDataset_Training_Path\"'+g' ./experiments/EXAMPLE_gpt.yml\n",
        "if Fp16:\n",
        "  os.system(\"sed -i 's+fp16: false+fp16: true+g' ./experiments/EXAMPLE_gpt.yml\")\n",
        "!sed -i 's/use_8bit: true/use_8bit: '\"$Use8bit\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "\n",
        "DisableStateSaving = not SaveTrainingStates  # the YAML key is the inverse of the checkbox\n",
        "!sed -i 's/disable_state_saving: true/disable_state_saving: '\"$DisableStateSaving\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/tortoise_compat: True/tortoise_compat: '\"$TortoiseCompat\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/number_of_checkpoints_to_save: 0/number_of_checkpoints_to_save: '\"$Keep_Last_N_Checkpoints\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "\n",
        "\n",
        "!sed -i 's/CHANGEME_training_dataset_name/'\"$Dataset_Training_Name\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's/CHANGEME_your_experiment_name/'\"$Experiment_Name\"'/g' ./experiments/EXAMPLE_gpt.yml\n",
        "!sed -i 's+CHANGEME_path_to_training_dataset+'\"$Dataset_Training_Path\"'+g' ./experiments/EXAMPLE_gpt.yml\n",
        "\n",
        "\n",
        "if TrainingRate != \"1e-5\":\n",
        "  os.system(\"sed -i 's+!!float 1e-5 # CHANGEME:+!!float '\" + TrainingRate + \"' #+g' ./experiments/EXAMPLE_gpt.yml\")\n",
        "\n",
        "# Print the contents of the file\n",
        "with open('./experiments/EXAMPLE_gpt.yml', 'r') as configfile:\n",
        "    print(configfile.read())"
      ],
      "metadata": {
        "id": "5zaScjLUt4HM"
      },
      "execution_count": null,
      "outputs": []
    },
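    {
      "cell_type": "markdown",
      "source": [
        "The `sed` calls above are plain find-and-replace edits on the YAML template. If you prefer to stay in Python, the same idea can be sketched as a small helper; this is an illustrative alternative (the function name is ours, not part of DLAS), covering two of the substitutions above:\n",
        "\n",
        "```python\n",
        "from pathlib import Path\n",
        "\n",
        "def apply_settings(cfg_path, train_bs, experiment_name):\n",
        "    # read the template, swap placeholders, and write it back\n",
        "    text = Path(cfg_path).read_text()\n",
        "    text = text.replace(\"batch_size: 128\", f\"batch_size: {train_bs}\")\n",
        "    text = text.replace(\"CHANGEME_your_experiment_name\", experiment_name)\n",
        "    Path(cfg_path).write_text(text)\n",
        "\n",
        "# apply_settings(\"./experiments/EXAMPLE_gpt.yml\", train_bs, Experiment_Name)\n",
        "```"
      ],
      "metadata": {
        "id": "yaml-edit-python-sketch"
      }
    },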
    {
      "cell_type": "markdown",
      "source": [
        "## 8. Training:\n",
        "Finally, the code starts the training process by running the `train.py` script with the configured YAML file.\n",
        "\n",
        "Press the stop button on this cell when you are satisfied with the results and have seen:\n",
        "\n",
        "`INFO:base:Saving models and training states.`\n",
        "\n",
        "If your training run saves many models, you might exceed the storage limits of the Colab runtime. To prevent this, delete old checkpoints in /content/DL-Art-School/experiments/$Experiment_Name/(models|training_state)/* via the file explorer panel while training runs. **Resuming training after a crash requires config editing,** so try not to let that happen.\n",
        "\n",
        "\n",
        "<img src=\"https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/ch09_3_Google_Colab_DLAS_checkpoint_directory.png\">\n"
      ],
      "metadata": {
        "id": "Ay0tYbU8DUTo"
      }
    },
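    {
      "cell_type": "markdown",
      "source": [
        "Deleting old checkpoints by hand in the file explorer works, but a small helper run in a separate cell can do the same. This is a hypothetical sketch (the function name and `keep` parameter are ours, not part of DLAS); it keeps only the newest `keep` checkpoint files, matching the \"0 means keep all\" convention from the settings cell:\n",
        "\n",
        "```python\n",
        "import os\n",
        "from pathlib import Path\n",
        "\n",
        "def prune_checkpoints(models_dir, keep=2):\n",
        "    # sort .pth files oldest-first by modification time\n",
        "    ckpts = sorted(Path(models_dir).glob(\"*.pth\"), key=os.path.getmtime)\n",
        "    # keep == 0 means keep everything, mirroring Keep_Last_N_Checkpoints\n",
        "    for old in (ckpts[:-keep] if keep else []):\n",
        "        old.unlink()\n",
        "\n",
        "# prune_checkpoints(f\"/content/DL-Art-School/experiments/{Experiment_Name}/models\", keep=2)\n",
        "```"
      ],
      "metadata": {
        "id": "prune-checkpoints-sketch"
      }
    },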
    {
      "cell_type": "code",
      "source": [
        "%cd /content/DL-Art-School/codes\n",
        "\n",
        "!python3 train.py -opt ../experiments/EXAMPLE_gpt.yml"
      ],
      "metadata": {
        "id": "ju5yCN_m51r7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d98d52c7-14d7-471e-bae6-edb5006b6ebc"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "/content/DL-Art-School/codes\n",
            "Disabled distributed training.\n",
            "24-04-11 23:21:05.167 - INFO:   name: Learn_OAI_Whisper_20240411_JRB\n",
            "  model: extensibletrainer\n",
            "  scale: 1\n",
            "  gpu_ids: [0]\n",
            "  start_step: 0\n",
            "  checkpointing_enabled: True\n",
            "  fp16: False\n",
            "  use_8bit: True\n",
            "  wandb: False\n",
            "  use_tb_logger: True\n",
            "  datasets:[\n",
            "    train:[\n",
            "      name: TestDataset\n",
            "      n_workers: 8\n",
            "      batch_size: 4\n",
            "      mode: paired_voice_audio\n",
            "      path: /content/gdrive/MyDrive/output/Learn_OAI_Whisper_Sample_Audio01.mp3_2024_04_11-21_39/train.txt\n",
            "      fetcher_mode: ['lj']\n",
            "      phase: train\n",
            "      max_wav_length: 255995\n",
            "      max_text_length: 200\n",
            "      sample_rate: 22050\n",
            "      load_conditioning: True\n",
            "      num_conditioning_candidates: 2\n",
            "      conditioning_length: 44000\n",
            "      use_bpe_tokenizer: True\n",
            "      load_aligned_codes: False\n",
            "      data_type: img\n",
            "    ]\n",
            "    val:[\n",
            "      name: TestValidation\n",
            "      n_workers: 1\n",
            "      batch_size: 1\n",
            "      mode: paired_voice_audio\n",
            "      path: /content/gdrive/MyDrive/output/Learn_OAI_Whisper_Sample_Audio01.mp3_2024_04_11-21_39/valid.txt\n",
            "      fetcher_mode: ['lj']\n",
            "      phase: val\n",
            "      max_wav_length: 255995\n",
            "      max_text_length: 200\n",
            "      sample_rate: 22050\n",
            "      load_conditioning: True\n",
            "      num_conditioning_candidates: 2\n",
            "      conditioning_length: 44000\n",
            "      use_bpe_tokenizer: True\n",
            "      load_aligned_codes: False\n",
            "      data_type: img\n",
            "    ]\n",
            "  ]\n",
            "  steps:[\n",
            "    gpt_train:[\n",
            "      training: gpt\n",
            "      loss_log_buffer: 500\n",
            "      optimizer: adamw\n",
            "      optimizer_params:[\n",
            "        lr: 1e-05\n",
            "        triton: False\n",
            "        weight_decay: 0.01\n",
            "        beta1: 0.9\n",
            "        beta2: 0.96\n",
            "      ]\n",
            "      clip_grad_eps: 4\n",
            "      injectors:[\n",
            "        paired_to_mel:[\n",
            "          type: torch_mel_spectrogram\n",
            "          mel_norm_file: ../experiments/clips_mel_norms.pth\n",
            "          in: wav\n",
            "          out: paired_mel\n",
            "        ]\n",
            "        paired_cond_to_mel:[\n",
            "          type: for_each\n",
            "          subtype: torch_mel_spectrogram\n",
            "          mel_norm_file: ../experiments/clips_mel_norms.pth\n",
            "          in: conditioning\n",
            "          out: paired_conditioning_mel\n",
            "        ]\n",
            "        to_codes:[\n",
            "          type: discrete_token\n",
            "          in: paired_mel\n",
            "          out: paired_mel_codes\n",
            "          dvae_config: ../experiments/train_diffusion_vocoder_22k_level.yml\n",
            "        ]\n",
            "        paired_fwd_text:[\n",
            "          type: generator\n",
            "          generator: gpt\n",
            "          in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']\n",
            "          out: ['loss_text_ce', 'loss_mel_ce', 'logits']\n",
            "        ]\n",
            "      ]\n",
            "      losses:[\n",
            "        text_ce:[\n",
            "          type: direct\n",
            "          weight: 0.01\n",
            "          key: loss_text_ce\n",
            "        ]\n",
            "        mel_ce:[\n",
            "          type: direct\n",
            "          weight: 1\n",
            "          key: loss_mel_ce\n",
            "        ]\n",
            "      ]\n",
            "    ]\n",
            "  ]\n",
            "  networks:[\n",
            "    gpt:[\n",
            "      type: generator\n",
            "      which_model_G: unified_voice2\n",
            "      kwargs:[\n",
            "        layers: 30\n",
            "        model_dim: 1024\n",
            "        heads: 16\n",
            "        max_text_tokens: 402\n",
            "        max_mel_tokens: 604\n",
            "        max_conditioning_inputs: 2\n",
            "        mel_length_compression: 1024\n",
            "        number_text_tokens: 256\n",
            "        number_mel_codes: 8194\n",
            "        start_mel_token: 8192\n",
            "        stop_mel_token: 8193\n",
            "        start_text_token: 255\n",
            "        train_solo_embeddings: False\n",
            "        use_mel_codes_as_input: True\n",
            "        checkpointing: True\n",
            "        tortoise_compat: False\n",
            "      ]\n",
            "    ]\n",
            "  ]\n",
            "  path:[\n",
            "    pretrain_model_gpt: ../experiments/autoregressive.pth\n",
            "    strict_load: True\n",
            "    root: /content/DL-Art-School\n",
            "    experiments_root: /content/DL-Art-School/experiments/Learn_OAI_Whisper_20240411_JRB\n",
            "    models: /content/DL-Art-School/experiments/Learn_OAI_Whisper_20240411_JRB/models\n",
            "    training_state: /content/DL-Art-School/experiments/Learn_OAI_Whisper_20240411_JRB/training_state\n",
            "    log: /content/DL-Art-School/experiments/Learn_OAI_Whisper_20240411_JRB\n",
            "    val_images: /content/DL-Art-School/experiments/Learn_OAI_Whisper_20240411_JRB/val_images\n",
            "  ]\n",
            "  train:[\n",
            "    niter: 50000\n",
            "    warmup_iter: -1\n",
            "    mega_batch_factor: 4\n",
            "    val_freq: 60\n",
            "    default_lr_scheme: MultiStepLR\n",
            "    gen_lr_steps: [20, 40, 56, 72]\n",
            "    lr_gamma: 0.5\n",
            "    ema_enabled: False\n",
            "  ]\n",
            "  eval:[\n",
            "    pure: True\n",
            "  ]\n",
            "  logger:[\n",
            "    print_freq: 20\n",
            "    save_checkpoint_freq: 60\n",
            "    visuals: ['gen', 'mel']\n",
            "    visual_debug_rate: 500\n",
            "    is_mel_spectrogram: True\n",
            "    disable_state_saving: False\n",
            "  ]\n",
            "  upgrades:[\n",
            "    number_of_checkpoints_to_save: 0\n",
            "    number_of_states_to_save: 0\n",
            "  ]\n",
            "  is_train: True\n",
            "  dist: False\n",
            "\n",
            "2024-04-11 23:21:05.501956: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
            "2024-04-11 23:21:05.501997: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
            "2024-04-11 23:21:05.503282: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
            "2024-04-11 23:21:06.443727: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "24-04-11 23:21:07.334 - INFO: Random seed: 4195\n",
            "24-04-11 23:21:14.199 - INFO: Number of training data elements: 4, iters: 1\n",
            "24-04-11 23:21:14.199 - INFO: Total epochs needed: 50000 for iters 50,000\n",
            "24-04-11 23:21:14.201 - INFO: Number of val images in [TestValidation]: 1\n",
            "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py:380: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.\n",
            "  warnings.warn(\n",
            "Loading from ../experiments/dvae.pth\n",
            "24-04-11 23:21:23.723 - INFO: Network gpt structure: DataParallel, with parameters: 421,526,786\n",
            "24-04-11 23:21:23.723 - INFO: UnifiedVoice(\n",
            "  (conditioning_encoder): ConditioningEncoder(\n",
            "    (init): Conv1d(80, 1024, kernel_size=(1,), stride=(1,))\n",
            "    (attn): Sequential(\n",
            "      (0): AttentionBlock(\n",
            "        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)\n",
            "        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))\n",
            "        (attention): QKVAttentionLegacy()\n",
            "        (x_proj): Identity()\n",
            "        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))\n",
            "      )\n",
            "      (1): AttentionBlock(\n",
            "        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)\n",
            "        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))\n",
            "        (attention): QKVAttentionLegacy()\n",
            "        (x_proj): Identity()\n",
            "        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))\n",
            "      )\n",
            "      (2): AttentionBlock(\n",
            "        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)\n",
            "        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))\n",
            "        (attention): QKVAttentionLegacy()\n",
            "        (x_proj): Identity()\n",
            "        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))\n",
            "      )\n",
            "      (3): AttentionBlock(\n",
            "        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)\n",
            "        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))\n",
            "        (attention): QKVAttentionLegacy()\n",
            "        (x_proj): Identity()\n",
            "        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))\n",
            "      )\n",
            "      (4): AttentionBlock(\n",
            "        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)\n",
            "        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))\n",
            "        (attention): QKVAttentionLegacy()\n",
            "        (x_proj): Identity()\n",
            "        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))\n",
            "      )\n",
            "      (5): AttentionBlock(\n",
            "        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)\n",
            "        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))\n",
            "        (attention): QKVAttentionLegacy()\n",
            "        (x_proj): Identity()\n",
            "        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))\n",
            "      )\n",
            "    )\n",
            "  )\n",
            "  (text_embedding): Embedding(256, 1024)\n",
            "  (mel_embedding): Embedding(8194, 1024)\n",
            "  (gpt): GPT2Model(\n",
            "    (drop): Dropout(p=0.1, inplace=False)\n",
            "    (h): ModuleList(\n",
            "      (0-29): 30 x GPT2Block(\n",
            "        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n",
            "        (attn): GPT2Attention(\n",
            "          (c_attn): Conv1D()\n",
            "          (c_proj): Conv1D()\n",
            "          (attn_dropout): Dropout(p=0.1, inplace=False)\n",
            "          (resid_dropout): Dropout(p=0.1, inplace=False)\n",
            "        )\n",
            "        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n",
            "        (mlp): GPT2MLP(\n",
            "          (c_fc): Conv1D()\n",
            "          (c_proj): Conv1D()\n",
            "          (act): NewGELUActivation()\n",
            "          (dropout): Dropout(p=0.1, inplace=False)\n",
            "        )\n",
            "      )\n",
            "    )\n",
            "    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n",
            "  )\n",
            "  (mel_pos_embedding): LearnedPositionEmbeddings(\n",
            "    (emb): Embedding(608, 1024)\n",
            "  )\n",
            "  (text_pos_embedding): LearnedPositionEmbeddings(\n",
            "    (emb): Embedding(404, 1024)\n",
            "  )\n",
            "  (final_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n",
            "  (text_head): Linear(in_features=1024, out_features=256, bias=True)\n",
            "  (mel_head): Linear(in_features=1024, out_features=8194, bias=True)\n",
            ")\n",
            "24-04-11 23:21:23.723 - INFO: Loading model for [../experiments/autoregressive.pth]\n",
            "24-04-11 23:21:25.137 - INFO: Start training from epoch: 0, iter: 0\n",
            "  0% 0/1 [00:00<?, ?it/s]/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n",
            "  self.pid = os.fork()\n",
            "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
            "  warnings.warn(\"Detected call of `lr_scheduler.step()` before `optimizer.step()`. \"\n",
            "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
            "  warnings.warn(\n",
            "100% 1/1 [00:06<00:00,  6.39s/it]\n",
            "100% 1/1 [00:01<00:00,  1.43s/it]\n",
            "100% 1/1 [00:01<00:00,  1.46s/it]\n",
            "100% 1/1 [00:01<00:00,  1.44s/it]\n",
            "100% 1/1 [00:01<00:00,  1.45s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "100% 1/1 [00:01<00:00,  1.45s/it]\n",
            "100% 1/1 [00:01<00:00,  1.44s/it]\n",
            "100% 1/1 [00:01<00:00,  1.47s/it]\n",
            "100% 1/1 [00:01<00:00,  1.46s/it]\n",
            "100% 1/1 [00:01<00:00,  1.46s/it]\n",
            "100% 1/1 [00:01<00:00,  1.45s/it]\n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.45s/it]\n",
            "100% 1/1 [00:01<00:00,  1.46s/it]\n",
            "100% 1/1 [00:01<00:00,  1.44s/it]\n",
            "100% 1/1 [00:01<00:00,  1.47s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "100% 1/1 [00:01<00:00,  1.47s/it]\n",
            "  0% 0/1 [00:00<?, ?it/s]24-04-11 23:21:59.251 - INFO: [epoch: 19, iter:      20, lr:(5.000e-06,5.000e-06,)] step: 2.0000e+01 samples: 8.0000e+01 megasamples: 8.0000e-05 iteration_rate: 2.8831e-01 loss_text_ce: 6.1073e+00 loss_mel_ce: 5.0707e+00 loss_gpt_total: 5.1318e+00 grad_scaler_scale: 1.0000e+00 learning_rate_gpt_0: 5.0000e-06 learning_rate_gpt_1: 5.0000e-06 total_samples_loaded: 8.0000e+01 percent_skipped_samples: 0.0000e+00 percent_conditioning_is_self: 1.0000e+00 gpt_conditioning_encoder: 3.3424e+00 gpt_gpt: 2.6932e+00 gpt_heads: 1.3426e+00 \n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.46s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.47s/it]\n",
            "100% 1/1 [00:01<00:00,  1.51s/it]\n",
            "100% 1/1 [00:01<00:00,  1.47s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "100% 1/1 [00:01<00:00,  1.53s/it]\n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "  0% 0/1 [00:00<?, ?it/s]24-04-11 23:22:29.029 - INFO: [epoch: 39, iter:      40, lr:(2.500e-06,2.500e-06,)] step: 4.0000e+01 samples: 1.6000e+02 megasamples: 1.6000e-04 iteration_rate: 2.9139e-01 loss_text_ce: 6.0157e+00 loss_mel_ce: 4.3316e+00 loss_gpt_total: 4.3918e+00 grad_scaler_scale: 1.0000e+00 learning_rate_gpt_0: 2.5000e-06 learning_rate_gpt_1: 2.5000e-06 total_samples_loaded: 1.6000e+02 percent_skipped_samples: 0.0000e+00 percent_conditioning_is_self: 1.0000e+00 gpt_conditioning_encoder: 2.2077e+00 gpt_gpt: 2.3990e+00 gpt_heads: 1.3954e+00 \n",
            "100% 1/1 [00:01<00:00,  1.54s/it]\n",
            "100% 1/1 [00:01<00:00,  1.54s/it]\n",
            "100% 1/1 [00:01<00:00,  1.51s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.49s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.50s/it]\n",
            "100% 1/1 [00:01<00:00,  1.53s/it]\n",
            "100% 1/1 [00:01<00:00,  1.53s/it]\n",
            "100% 1/1 [00:01<00:00,  1.55s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.54s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "100% 1/1 [00:01<00:00,  1.52s/it]\n",
            "  0% 0/1 [00:00<?, ?it/s]24-04-11 23:22:59.477 - INFO: [epoch: 59, iter:      60, lr:(1.250e-06,1.250e-06,)] step: 6.0000e+01 samples: 2.4000e+02 megasamples: 2.4000e-04 iteration_rate: 2.9822e-01 loss_text_ce: 5.9842e+00 loss_mel_ce: 3.8446e+00 loss_gpt_total: 3.9044e+00 grad_scaler_scale: 1.0000e+00 learning_rate_gpt_0: 1.2500e-06 learning_rate_gpt_1: 1.2500e-06 total_samples_loaded: 2.4000e+02 percent_skipped_samples: 0.0000e+00 percent_conditioning_is_self: 1.0000e+00 gpt_conditioning_encoder: 1.3167e+00 gpt_gpt: 2.3504e+00 gpt_heads: 1.3918e+00 \n",
            "24-04-11 23:22:59.478 - INFO: Saving models and training states.\n",
            "100% 1/1 [00:08<00:00,  8.40s/it]\n",
            "  0% 0/1 [00:00<?, ?it/s]\n",
            "  0% 0/1 [00:00<?, ?it/s]\u001b[A\n",
            "100% 1/1 [00:00<00:00,  1.52it/s]\n",
            ">>Eval loss_text_ce: 6.6796064376831055\n",
            ">>Eval loss_mel_ce: 6.462783336639404\n",
            "100% 1/1 [00:02<00:00,  2.30s/it]\n",
            "100% 1/1 [00:01<00:00,  1.48s/it]\n",
            "100% 1/1 [00:01<00:00,  1.53s/it]\n",
            "100% 1/1 [00:01<00:00,  1.55s/it]\n",
            "100% 1/1 [00:01<00:00,  1.56s/it]\n",
            "100% 1/1 [00:01<00:00,  1.54s/it]\n",
            "100% 1/1 [00:01<00:00,  1.53s/it]\n",
            "100% 1/1 [00:01<00:00,  1.57s/it]\n",
            "100% 1/1 [00:01<00:00,  1.56s/it]\n",
            "  0% 0/1 [00:01<?, ?it/s]\n",
            "Traceback (most recent call last):\n",
            "  File \"/content/DL-Art-School/codes/train.py\", line 399, in <module>\n",
            "    trainer.do_training()\n",
            "  File \"/content/DL-Art-School/codes/train.py\", line 354, in do_training\n",
            "    self.do_step(train_data)\n",
            "  File \"/content/DL-Art-School/codes/train.py\", line 212, in do_step\n",
            "    gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)\n",
            "  File \"/content/DL-Art-School/codes/trainer/ExtensibleTrainer.py\", line 306, in optimize_parameters\n",
            "    ns = step.do_forward_backward(state, m, step_num, train=train_step, no_ddp_sync=(m+1 < self.batch_factor))\n",
            "  File \"/content/DL-Art-School/codes/trainer/steps.py\", line 309, in do_forward_backward\n",
            "    self.scaler.scale(total_loss).backward()\n",
            "  File \"/usr/local/lib/python3.10/dist-packages/torch/_tensor.py\", line 522, in backward\n",
            "    torch.autograd.backward(\n",
            "  File \"/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py\", line 266, in backward\n",
            "    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass\n",
            "KeyboardInterrupt\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 9. Exporting to Google Drive:\n",
        "After training, the code provides an option to copy the experiment folder to the user's Google Drive for persistence."
      ],
      "metadata": {
        "id": "iyGVnx5jrnvV"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cp -r /content/DL-Art-School/experiments/$Experiment_Name /content/gdrive/MyDrive/"
      ],
      "metadata": {
        "id": "BqydhU4Gwlv2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "After running the cell, go to your Google Drive using a web browser, and you will see a directory with the same name as the value of the `Experiment_Name` variable.\n",
        "\n",
        "<img src=\"https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/ch09_3_Google_drive_DLAS_checkpoint_directory.JPG\">\n",
        "\n",
        "Inside the `<Experiment_Name>/models` folder you will find the model checkpoints; they are the files with the `.pth` extension.\n",
        "\n",
        "<img src=\"https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/ch09_3_Google_Colab_DLAS_checkpoint_pth_file.JPG\">\n",
        "\n",
        "That `.pth` file is the fine-tuned PVS model we just created with DLAS. In the next step, we will use that file to synthesize the voice using `TorToiSe-tts-fast`."
      ],
      "metadata": {
        "id": "kqBaNPGahd9j"
      }
    }
  ]
}